A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
This report has been generated by the nf-core/sarek analysis pipeline. For information about how to interpret these results, please see the documentation.
Report generated on 2025-07-08, 12:41 EDT, based on data in:
/sc/arion/projects/NGSCRC/Work/Seqera/ba/fb9db1d6241bd7fe6c1c4fa100fcdb
General Statistics
Showing 1504 samples.
Bcftools
Utilities for variant calling and manipulating VCFs and BCFs.
URL: https://samtools.github.io/bcftools
DOI: 10.1093/gigascience/giab008
Variant Substitution Types
Variant Quality
Indel Distribution
Variant depths
Read depth support distribution for called variants
Vcftools
Program to analyse and report on VCF files.
URL: https://vcftools.github.io
DOI: 10.1093/bioinformatics/btr330
TsTv by Count
Plot of TSTV-BY-COUNT: the transition-to-transversion (Ts/Tv) ratio as a function of alternative allele count, from the output of vcftools --TsTv-by-count.
A transition is a purine-to-purine or pyrimidine-to-pyrimidine point mutation.
A transversion is a purine-to-pyrimidine or pyrimidine-to-purine point mutation.
Alternative allele count is the number of copies of the alternative allele observed across all genotypes at the site.
Note: only bi-allelic SNPs are used (multi-allelic sites and INDELs are skipped).
Refer to the VCFtools manual (https://vcftools.github.io/man_latest.html) for details on --TsTv-by-count.
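The definitions above can be made concrete with a small Python sketch. It assumes each site has already been reduced to a (REF, ALT, alternative allele count) tuple for a bi-allelic SNP; the function names is_transition and tstv_by_count are hypothetical, and this is an illustration of the idea rather than the vcftools implementation.

```python
from collections import Counter

# Hypothetical helpers illustrating Ts/Tv by alternative allele count;
# this is not the vcftools code.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Return True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def tstv_by_count(sites):
    """sites: iterable of (ref, alt, alt_allele_count) for bi-allelic SNPs.

    Returns {alt_allele_count: (n_ts, n_tv, ts_tv_ratio)}, with the ratio set
    to None when a group contains no transversions.
    """
    ts, tv = Counter(), Counter()
    for ref, alt, ac in sites:
        if is_transition(ref, alt):
            ts[ac] += 1
        else:
            tv[ac] += 1
    return {
        ac: (ts[ac], tv[ac], ts[ac] / tv[ac] if tv[ac] else None)
        for ac in sorted(set(ts) | set(tv))
    }
```

For example, two transitions and one transversion at alternative allele count 1 give a Ts/Tv ratio of 2.0 for that group.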
TsTv by Qual
Plot of TSTV-BY-QUAL: the transition-to-transversion (Ts/Tv) ratio as a function of SNP quality, from the output of vcftools --TsTv-by-qual.
A transition is a purine-to-purine or pyrimidine-to-pyrimidine point mutation.
A transversion is a purine-to-pyrimidine or pyrimidine-to-purine point mutation.
Quality here is the Phred-scaled quality score given in the QUAL column of the VCF file.
Note: only bi-allelic SNPs are used (multi-allelic sites and INDELs are skipped).
Refer to the VCFtools manual (https://vcftools.github.io/man_latest.html) for details on --TsTv-by-qual.
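As a rough illustration of Ts/Tv as a function of quality, the Python sketch below computes the ratio over SNPs whose QUAL is at or above each of a set of thresholds. The function name and the "at or above threshold" convention are assumptions made for this example; vcftools' own binning and output columns may differ.

```python
# Hypothetical helper; vcftools' --TsTv-by-qual binning and output columns
# may differ from this simplified version. Assumes valid bi-allelic SNPs.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def tstv_at_or_above_qual(snps, thresholds):
    """snps: iterable of (ref, alt, qual) for bi-allelic SNPs.

    For each threshold, return the Ts/Tv ratio of the SNPs whose QUAL is at
    least that threshold (None when no transversions pass).
    """
    snps = list(snps)
    result = {}
    for threshold in thresholds:
        ts = tv = 0
        for ref, alt, qual in snps:
            if qual < threshold:
                continue
            if frozenset((ref.upper(), alt.upper())) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
        result[threshold] = ts / tv if tv else None
    return result
```

Raising the threshold typically removes low-quality calls, so a shift in Ts/Tv with increasing QUAL can hint at how many low-quality calls are likely false positives.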
SnpEff
Annotates and predicts the effects of variants on genes (such as amino acid changes).
URL: http://snpeff.sourceforge.net
DOI: 10.4161/fly.19695
Variants by Genomic Region
The stacked bar plot shows where the detected variants are located in the genome and the number of variants in each type of region.
By default, SnpEff defines the upstream and downstream regions as 5,000 bp intervals.
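The role of the 5,000 bp interval can be sketched for a single gene in Python. The Gene class, the classify_region function and its region labels are hypothetical simplifications: SnpEff works on transcripts and further splits genic positions into exons, introns, UTRs and so on. The sketch also assumes that a variant exactly 5,000 bp away still counts as upstream or downstream.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal gene model (1-based, inclusive coordinates)."""
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_region(pos: int, gene: Gene, ud: int = 5000) -> str:
    """Simplified region label for a variant position relative to one gene."""
    if gene.start <= pos <= gene.end:
        return "genic"  # SnpEff would split this further (exon, intron, UTR...)
    if pos < gene.start:
        distance = gene.start - pos
        side = "upstream" if gene.strand == "+" else "downstream"
    else:
        distance = pos - gene.end
        side = "downstream" if gene.strand == "+" else "upstream"
    # Assumption: a variant exactly `ud` bases away still counts as up/downstream.
    return side if distance <= ud else "intergenic"
```

Note that "upstream" depends on strand: for a gene on the minus strand, positions with larger coordinates lie upstream.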
Variant Effects by Impact
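The stacked bar plot shows the putative impact of detected variants and the number of variants for each impact level.
SnpEff predicts four impact levels:
- High: disruptive impact on the protein (e.g. a variant that creates a premature stop codon)
- Moderate: non-disruptive, but possibly changing protein effectiveness (e.g. a missense variant)
- Low: mostly harmless (e.g. a synonymous, or silent, variant)
- Modifier: usually non-coding variants, where there is no evidence of impact or predictions are difficult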
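The impact levels above can be tallied from SnpEff's ANN INFO field, whose comma-separated entries each start with the sub-fields Allele | Annotation | Annotation_Impact | Gene_Name | .... The Python sketch below assumes that layout; the helper names are hypothetical, and the example ANN strings used to test it are invented and truncated to their first eight sub-fields. Counting every entry means a variant annotated against several transcripts contributes once per entry.

```python
from collections import Counter
from typing import Optional

# SnpEff impact levels, from most to least severe.
IMPACT_ORDER = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def impacts(ann_value: str):
    """Yield the Annotation_Impact sub-field (third '|' field) of every
    comma-separated entry in a VCF INFO/ANN value."""
    for entry in ann_value.split(","):
        yield entry.split("|")[2]


def count_impacts(ann_values) -> Counter:
    """Count impact levels over all annotation entries of all variants."""
    return Counter(level for value in ann_values for level in impacts(value))


def most_severe_impact(ann_value: str) -> Optional[str]:
    """Return the most severe impact among one variant's annotations."""
    found = set(impacts(ann_value))
    return next((level for level in IMPACT_ORDER if level in found), None)
```

Whether to count every annotation or only the most severe one per variant is a design choice; the two can give quite different totals for genes with many transcripts.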
Variants by Effect Types
The stacked bar plot shows the number of variants for each effect type.
Effects are described with respect to the mRNA (for example, whether a variant falls in an exon, an intron or a splice site, or changes the coding sequence).
Variants by Functional Class
The stacked bar plot shows the number of variants in each functional class.
The functional class describes the effect of a variant on the translation of the mRNA into protein. There are three possible cases:
- Silent: The amino acid does not change.
- Missense: The amino acid is different.
- Nonsense: The variant generates a stop codon.
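The three cases can be checked for a single-codon change with the standard genetic code, as in the Python sketch below. CODON_TABLE and functional_class are hypothetical names; the sketch ignores context such as splice sites and lumps a lost stop codon into "MISSENSE", which is a simplification.

```python
# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def functional_class(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as SILENT, MISSENSE or NONSENSE ('*' = stop)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "SILENT"
    if alt_aa == "*":
        return "NONSENSE"
    return "MISSENSE"  # simplification: stop-lost changes also land here
```

For example, GAA to GAG still encodes glutamate (silent), GAA to GTA gives valine (missense), and TGG to TGA replaces tryptophan with a stop codon (nonsense).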
Variant Qualities
The line plot shows the number of variants as a function of the variant quality score.
The quality score corresponds to the QUAL column of the VCF file. This score is set by the variant caller.
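Because QUAL is Phred-scaled, a value Q corresponds to an estimated probability of 10^(-Q/10) that the ALT assertion is wrong (QUAL 20 is about 1 in 100, QUAL 30 about 1 in 1,000). The Python sketch below shows that conversion and a simple fixed-width histogram of QUAL values; the function names and the bin width of 10 are assumptions for illustration, not the binning SnpEff uses for this plot.

```python
from collections import Counter


def phred_to_error_prob(qual: float) -> float:
    """Convert a Phred-scaled QUAL value into an error probability."""
    return 10 ** (-qual / 10)


def qual_histogram(quals, bin_width: int = 10) -> dict:
    """Count QUAL values in fixed-width bins keyed by each bin's lower edge."""
    counts = Counter(int(q // bin_width) * bin_width for q in quals)
    return dict(sorted(counts.items()))
```

For example, qual_histogram([5, 12, 18, 35.2, 99]) places two values in the 10 to 20 bin and one value each in the bins starting at 0, 30 and 90.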
VEP
Determines the effect of variants on genes, transcripts and protein sequences, as well as regulatory regions.
URL: https://www.ensembl.org/info/docs/tools/vep/index.html
DOI: 10.1186/s13059-016-0974-4
General Statistics
Table showing general statistics of the VEP annotation run.
Showing 1504 samples.
Variant classes
Classes of variants found in the data.
Consequences
Predicted consequences of variations.
SIFT summary
SIFT variant effect prediction.
PolyPhen summary
PolyPhen variant effect prediction.
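When VEP is run with both prediction and score output for SIFT and PolyPhen, each value appears in a combined "prediction(score)" form such as tolerated(0.23) or probably_damaging(0.98). The Python sketch below assumes that form and tallies predictions per category, similar in spirit to the summaries above; MultiQC itself reads VEP's own statistics file, and the helper names here are hypothetical.

```python
import re
from collections import Counter

# Assumed value form: "<prediction>(<score>)", e.g. "deleterious(0.01)".
_PREDICTION = re.compile(r"^(?P<label>[A-Za-z_]+)\((?P<score>[0-9.]+)\)$")


def parse_prediction(value: str):
    """Split a 'prediction(score)' string into (label, score)."""
    match = _PREDICTION.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected prediction format: {value!r}")
    return match.group("label"), float(match.group("score"))


def summarise(values) -> Counter:
    """Count prediction labels, skipping empty values (no prediction)."""
    return Counter(parse_prediction(v)[0] for v in values if v)
```

Empty values are skipped because most variants (non-coding, synonymous) carry no SIFT or PolyPhen prediction at all.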
Variants by chromosome
Number of variants found on each chromosome.
Position in protein
Relative position of affected amino acids in protein.
Software Versions
Software Versions lists versions of software tools extracted from file contents.
| Group | Software | Version |
|---|---|---|
| ASCAT | alleleCounter | 4.3.0 |
| | ascat | 3.1.1 |
| ASSESS_SIGNIFICANCE | controlfreec | 11.6 |
| BCFTOOLS_SORT | bcftools | 1.2 |
| BCFTOOLS_STATS | bcftools | 1.2 |
| CALCULATECONTAMINATION | gatk4 | 4.5.0.0 |
| CNNSCOREVARIANTS | gatk4 | 4.5.0.0 |
| CNVKIT_BATCH | cnvkit | 0.9.10 |
| | samtools | 1.17 |
| CNVKIT_CALL | cnvkit | 0.9.10 |
| CNVKIT_EXPORT | cnvkit | 0.9.10 |
| CNVKIT_GENEMETRICS | cnvkit | 0.9.10 |
| ENSEMBLVEP_VEP | ensemblvep | 113.0 |
| FILTERMUTECTCALLS | gatk4 | 4.5.0.0 |
| FILTERVARIANTTRANCHES | gatk4 | 4.5.0.0 |
| FREEBAYES | freebayes | 1.3.6 |
| FREEC2BED | controlfreec | 11.6b |
| FREEC2CIRCOS | controlfreec | 11.6b |
| FREEC_SOMATIC | controlfreec | 11.6b |
| FREEC_TUMORONLY | controlfreec | 11.6b |
| GATK4_HAPLOTYPECALLER | gatk4 | 4.5.0.0 |
| GETPILEUPSUMMARIES | gatk4 | 4.5.0.0 |
| GETPILEUPSUMMARIES_NORMAL | gatk4 | 4.5.0.0 |
| GETPILEUPSUMMARIES_TUMOR | gatk4 | 4.5.0.0 |
| LEARNREADORIENTATIONMODEL | gatk4 | 4.5.0.0 |
| MAKEGRAPH2 | controlfreec | 11.6b |
| MSISENSORPRO_MSISOMATIC | msisensor-pro | 1.2.0 |
| MUTECT2 | gatk4 | 4.5.0.0 |
| MUTECT2_PAIRED | gatk4 | 4.5.0.0 |
| SAMTOOLS_MPILEUP | samtools | 1.21 |
| SNPEFF_SNPEFF | snpeff | 5.1d |
| STRELKA_SINGLE | strelka | 2.9.10 |
| STRELKA_SOMATIC | strelka | 2.9.10 |
| SVDB_MERGE | bcftools | 1.21 |
| | svdb | 2.8.2 |
| TABIX_BGZIPTABIX | tabix | 1.2 |
| TABIX_BGZIP_TIDDIT_SV | tabix | 1.2 |
| TABIX_TABIX | tabix | 1.2 |
| TABIX_VC_FREEBAYES | tabix | 1.2 |
| TIDDIT_SV | tiddit | 3.6.1 |
| VCFTOOLS_TSTV_COUNT | vcftools | 0.1.16 |
| Workflow | Nextflow | 24.04.4 |
| | nf-core/sarek | v3.5.0-gae4dd11 |
nf-core/sarek Methods Description
Suggested text and references to use when describing pipeline usage within the methods section of a publication.
URL: https://github.com/nf-core/sarek
Methods
Data were processed using nf-core/sarek v3.5.0 (doi: 10.12688/f1000research.16665.2; doi: 10.1093/nargab/lqae031; doi: 10.5281/zenodo.3476425), part of the nf-core collection of workflows (Ewels et al., 2020), using reproducible software environments from the Bioconda (Grüning et al., 2018) and BioContainers (da Veiga Leprevost et al., 2017) projects.
The pipeline was executed with Nextflow v24.04.4 (Di Tommaso et al., 2017) using the following command:
nextflow run 'https://github.com/nf-core/sarek' -name NRG-GY003_crams_7 -params-file 'https://api.cloud.seqera.io/ephemeral/d_hgo10DXjlnM1AB_iAS8w.json' -with-tower 'https://api.cloud.seqera.io' -r ae4dd11acc8b7e13fd6d4d45a92ff29a8e2b958d -profile singularity -resume aa0889b3-2262-4c90-8eff-62054a9be265
References
- Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. doi: 10.1038/nbt.3820
- Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276–278. doi: 10.1038/s41587-020-0439-x
- Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7
- da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: 10.1093/bioinformatics/btx192
Notes:
- The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!
- You should also cite all software used within this run. See the "Software Versions" section of this report for version information.
nf-core/sarek Workflow Summary
This information is collected when the pipeline is started.
URL: https://github.com/nf-core/sarek
Input/output options
- input: https://api.cloud.seqera.io/workspaces/26890372228482/datasets/3GfOwU6YBVYUh66oxmoIth/v/1/n/NRG-GY003_WES_cram.csv
- outdir: /sc/arion/projects/NGSCRC/Work/Seqera/NRG-GY003_All_Varcalls
- step: variant_calling
Main options
- intervals: /sc/arion/projects/NGSCRC/Resources/Twist_Bioscience_Comprehensive_Exome_Targets/Twist_Comprehensive_Exome_Covered_Targets_hg38.sorted.padded.merged.bed
- tools: freebayes,tiddit,cnvkit,ascat,msisensorpro,mutect2,controlfreec,haplotypecaller,strelka,snpeff,vep,merge
- wes: true
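Because the recorded command reads its parameters from an ephemeral Seqera Platform URL, it can help to keep a local copy of the key parameters alongside the results. The Python sketch below rebuilds a params file from the input/output and main options listed in this summary; the values are copied verbatim from the summary, while the names PARAMS and params_json and the file name params.json are hypothetical. The summary may not list every parameter that affected the run (for example, settings from config files), so this is a partial record, not a full reproduction.

```python
import json

# Values copied from the "Input/output options" and "Main options" sections of
# this summary; other sections can be added in the same way.
PARAMS = {
    "input": "https://api.cloud.seqera.io/workspaces/26890372228482/datasets/3GfOwU6YBVYUh66oxmoIth/v/1/n/NRG-GY003_WES_cram.csv",
    "outdir": "/sc/arion/projects/NGSCRC/Work/Seqera/NRG-GY003_All_Varcalls",
    "step": "variant_calling",
    "intervals": "/sc/arion/projects/NGSCRC/Resources/Twist_Bioscience_Comprehensive_Exome_Targets/Twist_Comprehensive_Exome_Covered_Targets_hg38.sorted.padded.merged.bed",
    "tools": "freebayes,tiddit,cnvkit,ascat,msisensorpro,mutect2,controlfreec,haplotypecaller,strelka,snpeff,vep,merge",
    "wes": True,
}


def params_json(params: dict) -> str:
    """Serialise parameters as JSON suitable for Nextflow's -params-file."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"
```

Writing the result with open("params.json", "w").write(params_json(PARAMS)) produces a file that could then be passed as nextflow run nf-core/sarek -r 3.5.0 -profile singularity -params-file params.json.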
Variant Calling
- cf_chrom_len: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/Length/Homo_sapiens_assembly38.len
- joint_mutect2: true
- pon: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz
- pon_tbi: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/1000g_pon.hg38.vcf.gz.tbi
Annotation
- vep_version: 113.0-0
Reference genome options
- ascat_alleles: /sc/arion/projects/NGSCRC/Resources/ASCAT_WES/battenberg_alleles_on_target_hg38.zip
- ascat_genome: hg38
- ascat_loci: /sc/arion/projects/NGSCRC/Resources/ASCAT_WES/battenberg_loci_on_target_hg38.zip
- ascat_loci_gc: /sc/arion/projects/NGSCRC/Resources/ASCAT_WES/GC_G1000_on_target_hg38.zip
- ascat_loci_rt: /sc/arion/projects/NGSCRC/Resources/ASCAT_WES/RT_G1000_on_target_hg38.zip
- bwa: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/BWAIndex/
- bwamem2: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/BWAmem2Index/
- chr_dir: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/Chromosomes
- dbsnp: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.dbsnp138.vcf
- dbsnp_tbi: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.dbsnp138.vcf.idx
- dbsnp_vqsr: --resource:dbsnp,known=false,training=true,truth=false,prior=2.0 dbsnp_146.hg38.vcf.gz
- dict: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38.dict
- dragmap: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Sequence/dragmap/
- fasta: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.fasta
- fasta_fai: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.fasta.fai
- germline_resource: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz
- germline_resource_tbi: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/GATKBundle/af-only-gnomad.hg38.vcf.gz.tbi
- known_indels: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.known_indels.vcf.gz
- known_indels_tbi: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
- known_indels_vqsr: --resource:gatk,known=false,training=true,truth=true,prior=10.0 Homo_sapiens_assembly38.known_indels.vcf.gz --resource:mills,known=false,training=true,truth=true,prior=10.0 Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
- known_snps: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/1000G_phase1.snps.high_confidence.hg38.vcf.gz
- known_snps_tbi: /sc/arion/projects/NGSCRC/Resources/gatk_hg38/gatk_bundle_2024/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
- known_snps_vqsr: --resource:1000G,known=false,training=true,truth=true,prior=10.0 1000G_omni2.5.hg38.vcf.gz
- mappability: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/Control-FREEC/out100m2_hg38.gem
- ngscheckmate_bed: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/NGSCheckMate/SNP_GRCh38_hg38_wChr.bed
- sentieon_dnascope_model: s3://ngi-igenomes/igenomes//Homo_sapiens/GATK/GRCh38/Annotation/Sentieon/SentieonDNAscopeModel1.1.model
- snpeff_cache: /sc/arion/projects/NGSCRC/Resources/SnpEff
- snpeff_db: GRCh38.105
- vep_cache: /sc/arion/projects/NGSCRC/Resources/VEP
- vep_cache_version: 113
- vep_genome: GRCh38
- vep_species: homo_sapiens
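The dbsnp_vqsr, known_indels_vqsr and known_snps_vqsr values above appear to follow GATK VariantRecalibrator's resource syntax, --resource:NAME,known=...,training=...,truth=...,prior=... FILE, with one or more resources per value. The Python sketch below parses such strings into dictionaries so the known/training/truth flags and priors can be checked at a glance; parse_vqsr_resources is a hypothetical helper, and it keeps every attribute value as a string.

```python
import re

# Assumed form: "--resource:<name>,<key=value,...> <file>", possibly repeated.
_RESOURCE = re.compile(r"--resource:(?P<name>[^,\s]+),(?P<attrs>\S+)\s+(?P<file>\S+)")


def parse_vqsr_resources(value: str) -> list:
    """Return one dict per --resource argument found in a VQSR parameter."""
    resources = []
    for match in _RESOURCE.finditer(value):
        attrs = dict(kv.split("=", 1) for kv in match.group("attrs").split(","))
        resources.append({"name": match.group("name"),
                          "file": match.group("file"), **attrs})
    return resources
```

Applied to the known_indels_vqsr value, this yields two resources (gatk and mills), both marked as training and truth sets with a prior of 10.0, whereas dbsnp_vqsr is a training-only resource with a prior of 2.0.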
Core Nextflow options
- configFiles: N/A
- containerEngine: singularity
- launchDir: /sc/arion/projects/NGSCRC/Work/Seqera
- profile: singularity
- projectDir: /sc/arion/projects/NGSCRC/Work/Seqera/.nextflow/pipelines/f83b9b1a/nf-core/sarek
- revision: 3.5.0
- runName: NRG-GY003_crams_7
- userName: monsok03
- workDir: /sc/arion/projects/NGSCRC/Work/Seqera